trimmed square
Iterative Least Trimmed Squares for Mixed Linear Regression
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions. We then evaluate it for the widely studied setting of isotropic Gaussian features, and establish that we match or better existing results in terms of sample complexity. Finally, we provide an ODE analysis for a gradient-descent variant of ILTS that has optimal time complexity. Our results provide initial theoretical evidence that iteratively fitting to the best subset of samples -- a potentially widely applicable idea -- can provably provide state of the art performance in bad training data settings.
Reviews: Iterative Least Trimmed Squares for Mixed Linear Regression
The paper considers the problem of mixed linear regression: in this problem, an algorithm is given access to n data samples (x_i, y_i) with possibly corrupted labels, where each y_i is one of the m possible linear functions of x_i, i.e., y_i x_i T theta_j for j in {1,...,m} (but the algorithm does not know which one). The goal of the algorithm is to determine vectors theta_1,..., theta_m. A straightforward but computationally inefficient (the complexity is exponential in d) approach to solving this problem is by using Least Trimmed Squares (LTS), which tries to identify the best fit vector (in terms of least squares) over all possible subsets of the data points of a particular, predefined size. To address this issue, the paper proposes using an alternative, simple, algorithm called Iterative Least Trimmed Squares (ILTS), which is similar to other algorithms that have been used for related problems in the literature, as acknowledged in the paper. The algorithm is essentially alternating minimization: it alternates between (1) finding the best set of a given size tau * n, given the least squares solution from the previous iteration and (2) solving least squares over the set determined in (1).
Reviews: Iterative Least Trimmed Squares for Mixed Linear Regression
This paper studies mixed linear regression and give a number of results. Under various deterministic conditions, they show that given a sufficiently warm start, iterative trimmed least squares converges to the true directions quickly. Their algorithm continues to work in the presence of adversarial corruptions. However the warm start is required to be quite close to the true solution. They give an SVD based initialization procedure that works in the non-noisy setting and when the examples come from a gaussian distribution.
Iterative Least Trimmed Squares for Mixed Linear Regression
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions.
Iterative Least Trimmed Squares for Mixed Linear Regression
Given a linear regression setting, Iterative Least Trimmed Squares (ILTS) involves alternating between (a) selecting the subset of samples with lowest current loss, and (b) re-fitting the linear model only on that subset. Both steps are very fast and simple. In this paper, we analyze ILTS in the setting of mixed linear regression with corruptions (MLR-C). We first establish deterministic conditions (on the features etc.) under which the ILTS iterate converges linearly to the closest mixture component. We also provide a global algorithm that uses ILTS as a subroutine, to fully solve mixed linear regressions with corruptions.